OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast

Internal identifier: 000063 (France/Analysis); previous: 000062; next: 000064

Authors: Hervé Bredin [France]; Johann Poignant [France]

RBID : Hal:hal-00953095

Abstract

Most state-of-the-art approaches address speaker diarization as a hierarchical agglomerative clustering problem in the audio domain. In this paper, we propose to revisit one of them: speech turns clustering based on the Bayesian Information Criterion (a.k.a. BIC clustering). First, we show how to model it as an integer linear programming (ILP) problem. Its resolution leads to the same overall diarization error rate as standard BIC clustering but generates significantly purer speaker clusters. Then, we describe how this approach can easily be extended to the audiovisual domain and TV broadcast in particular. The straightforward integration of detected overlaid names (used to introduce guests or journalists, and obtained via video OCR) into a multimodal ILP problem yields significantly better speaker diarization results. Finally, we explain how this novel paradigm can incidentally be used for unsupervised speaker identification (i.e. not relying on any prior acoustic speaker models). Experiments on the REPERE TV broadcast corpus show that it achieves performance close to that of an oracle capable of identifying any speaker as long as their name appears on screen at least once in the video.
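
To make the idea of clustering as an integer linear program more concrete, here is a minimal Python sketch. It is an illustration only and not the formulation used in the paper, which this record does not reproduce. The assumptions are: a hypothetical similarity matrix `sim` with values in [0, 1], one binary variable per pair of speech turns (1 means "same speaker"), transitivity constraints, and a hypothetical weight `alpha` balancing merged against separated pairs. The search is brute force, so it only suits toy inputs; a real system would hand the same objective and constraints to an ILP solver.

```python
from itertools import combinations, permutations, product


def cluster_turns(sim, alpha=0.5):
    """Cluster speech turns by exhaustively solving a tiny integer program.

    sim[i][j] is an assumed similarity in [0, 1] between turns i and j.
    x[(i, j)] == 1 means turns i and j are assigned to the same speaker.
    """
    n = len(sim)
    pairs = list(combinations(range(n), 2))
    best_score, best_x = float("-inf"), None

    for bits in product((0, 1), repeat=len(pairs)):
        x = dict(zip(pairs, bits))

        def same(a, b):
            return x[(a, b) if a < b else (b, a)]

        # Transitivity constraint: i~j and j~k together imply i~k.
        if any(same(i, j) + same(j, k) - same(i, k) > 1
               for i, j, k in permutations(range(n), 3)):
            continue

        # Linear objective: reward merging similar pairs and
        # separating dissimilar ones.
        score = sum(
            alpha * x[p] * sim[p[0]][p[1]]
            + (1 - alpha) * (1 - x[p]) * (1 - sim[p[0]][p[1]])
            for p in pairs
        )
        if score > best_score:
            best_score, best_x = score, x

    # Read the clusters off the optimal pairwise assignment.
    clusters = []
    for i in range(n):
        for cluster in clusters:
            if best_x[(cluster[0], i)] == 1:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy data: turns 0 and 1 sound alike, as do turns 2 and 3.
SIM = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(cluster_turns(SIM))
```

In this toy setting, raising `alpha` favours fewer, larger clusters and lowering it favours more, smaller ones. Extra evidence such as an overlaid name could in principle be added as further variables and constraints in the same program, though how the authors do so is not described here.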

Links to Exploration step

Hal:hal-00953095

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast</title>
<author>
<name sortKey="Bredin, Herve" sort="Bredin, Herve" uniqKey="Bredin H" first="Hervé" last="Bredin">Hervé Bredin</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-202" status="OLD">
<orgName>Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [Orsay]</orgName>
<orgName type="acronym">LIMSI</orgName>
<desc>
<address>
<addrLine>Université Paris Sud (Paris XI) Bât. 508 BP 133 91403 ORSAY CEDEX</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.limsi.fr/</ref>
</desc>
<listRelation>
<relation name="UPR3251" active="#struct-441569" type="direct"></relation>
<relation active="#struct-92966" type="direct"></relation>
<relation active="#struct-93591" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="UPR3251" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-92966" type="direct">
<org type="institution" xml:id="struct-92966" status="VALID">
<orgName>Université Paris-Sud - Paris 11</orgName>
<orgName type="acronym">UP11</orgName>
<desc>
<address>
<addrLine>Bâtiment 300 - 91405 Orsay cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.u-psud.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-93591" type="direct">
<org type="institution" xml:id="struct-93591" status="VALID">
<orgName>Université Pierre et Marie Curie - Paris 6</orgName>
<orgName type="acronym">UPMC</orgName>
<desc>
<address>
<addrLine>4 place Jussieu - 75005 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmc.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Poignant, Johann" sort="Poignant, Johann" uniqKey="Poignant J" first="Johann" last="Poignant">Johann Poignant</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-49640" status="VALID">
<orgName>Modélisation et Recherche d’Information Multimédia [Grenoble]</orgName>
<orgName type="acronym">MRIM</orgName>
<desc>
<address>
<addrLine>110 av. de la Chimie - Domaine Universitaire - BP 53 - 38041 Grenoble - cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://equipes-lig.imag.fr/mrim</ref>
</desc>
<listRelation>
<relation active="#struct-3886" type="direct"></relation>
<relation active="#struct-51016" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300275" type="direct"></relation>
<relation name="UMR5217" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-3886" type="direct">
<org type="institution" xml:id="struct-3886" status="OLD">
<idno type="IdRef">02640432X</idno>
<orgName>Université Pierre Mendès France - Grenoble 2</orgName>
<orgName type="acronym">UPMF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 47 - 38040 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-51016" type="direct">
<org type="institution" xml:id="struct-51016" status="OLD">
<idno type="IdRef">026404796</idno>
<orgName>Université Joseph Fourier - Grenoble 1</orgName>
<orgName type="acronym">UJF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 53 - 38041 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ujf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="direct">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300275" type="direct">
<org type="institution" xml:id="struct-300275" status="OLD">
<idno type="IdRef">026388804</idno>
<orgName>Institut National Polytechnique de Grenoble </orgName>
<orgName type="acronym">INPG</orgName>
<date type="end">2006-12-31</date>
<desc>
<address>
<addrLine>46 avenue Félix Viallet 38031 Grenoble Cedex 1</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5217" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Grenoble</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
<orgName type="university">Université Joseph Fourier</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Grenoble</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-00953095</idno>
<idno type="halId">hal-00953095</idno>
<idno type="halUri">https://hal.inria.fr/hal-00953095</idno>
<idno type="url">https://hal.inria.fr/hal-00953095</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Hal/Corpus">000068</idno>
<idno type="wicri:Area/Hal/Curation">000068</idno>
<idno type="wicri:Area/Hal/Checkpoint">000053</idno>
<idno type="wicri:Area/Main/Merge">000218</idno>
<idno type="wicri:Area/Main/Curation">000214</idno>
<idno type="wicri:Area/Main/Exploration">000214</idno>
<idno type="wicri:Area/France/Extraction">000063</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast</title>
<author>
<name sortKey="Bredin, Herve" sort="Bredin, Herve" uniqKey="Bredin H" first="Hervé" last="Bredin">Hervé Bredin</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-202" status="OLD">
<orgName>Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [Orsay]</orgName>
<orgName type="acronym">LIMSI</orgName>
<desc>
<address>
<addrLine>Université Paris Sud (Paris XI) Bât. 508 BP 133 91403 ORSAY CEDEX</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.limsi.fr/</ref>
</desc>
<listRelation>
<relation name="UPR3251" active="#struct-441569" type="direct"></relation>
<relation active="#struct-92966" type="direct"></relation>
<relation active="#struct-93591" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="UPR3251" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-92966" type="direct">
<org type="institution" xml:id="struct-92966" status="VALID">
<orgName>Université Paris-Sud - Paris 11</orgName>
<orgName type="acronym">UP11</orgName>
<desc>
<address>
<addrLine>Bâtiment 300 - 91405 Orsay cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.u-psud.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-93591" type="direct">
<org type="institution" xml:id="struct-93591" status="VALID">
<orgName>Université Pierre et Marie Curie - Paris 6</orgName>
<orgName type="acronym">UPMC</orgName>
<desc>
<address>
<addrLine>4 place Jussieu - 75005 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmc.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Poignant, Johann" sort="Poignant, Johann" uniqKey="Poignant J" first="Johann" last="Poignant">Johann Poignant</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-49640" status="VALID">
<orgName>Modélisation et Recherche d’Information Multimédia [Grenoble]</orgName>
<orgName type="acronym">MRIM</orgName>
<desc>
<address>
<addrLine>110 av. de la Chimie - Domaine Universitaire - BP 53 - 38041 Grenoble - cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://equipes-lig.imag.fr/mrim</ref>
</desc>
<listRelation>
<relation active="#struct-3886" type="direct"></relation>
<relation active="#struct-51016" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300275" type="direct"></relation>
<relation name="UMR5217" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-3886" type="direct">
<org type="institution" xml:id="struct-3886" status="OLD">
<idno type="IdRef">02640432X</idno>
<orgName>Université Pierre Mendès France - Grenoble 2</orgName>
<orgName type="acronym">UPMF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 47 - 38040 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-51016" type="direct">
<org type="institution" xml:id="struct-51016" status="OLD">
<idno type="IdRef">026404796</idno>
<orgName>Université Joseph Fourier - Grenoble 1</orgName>
<orgName type="acronym">UJF</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 53 - 38041 Grenoble Cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ujf-grenoble.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="direct">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300275" type="direct">
<org type="institution" xml:id="struct-300275" status="OLD">
<idno type="IdRef">026388804</idno>
<orgName>Institut National Polytechnique de Grenoble </orgName>
<orgName type="acronym">INPG</orgName>
<date type="end">2006-12-31</date>
<desc>
<address>
<addrLine>46 avenue Félix Viallet 38031 Grenoble Cedex 1</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5217" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Grenoble</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
<orgName type="university">Université Joseph Fourier</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Grenoble</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Most state-of-the-art approaches address speaker diarization as a hierarchical agglomerative clustering problem in the audio domain. In this paper, we propose to revisit one of them: speech turns clustering based on the Bayesian Information Criterion (a.k.a. BIC clustering). First, we show how to model it as an integer linear programming (ILP) problem. Its resolution leads to the same overall diarization error rate as standard BIC clustering but generates significantly purer speaker clusters. Then, we describe how this approach can easily be extended to the audiovisual domain and TV broadcast in particular. The straightforward integration of detected overlaid names (used to introduce guests or journalists, and obtained via video OCR) into a multimodal ILP problem yields significantly better speaker diarization results. Finally, we explain how this novel paradigm can incidentally be used for unsupervised speaker identification (i.e. not relying on any prior acoustic speaker models). Experiments on the REPERE TV broadcast corpus show that it achieves performance close to that of an oracle capable of identifying any speaker as long as their name appears on screen at least once in the video.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Auvergne-Rhône-Alpes</li>
<li>Rhône-Alpes</li>
</region>
<settlement>
<li>Grenoble</li>
</settlement>
<orgName>
<li>Université Joseph Fourier</li>
<li>Université de Grenoble</li>
</orgName>
</list>
<tree>
<country name="France">
<noRegion>
<name sortKey="Bredin, Herve" sort="Bredin, Herve" uniqKey="Bredin H" first="Hervé" last="Bredin">Hervé Bredin</name>
</noRegion>
<name sortKey="Poignant, Johann" sort="Poignant, Johann" uniqKey="Poignant J" first="Johann" last="Poignant">Johann Poignant</name>
</country>
</tree>
</affiliations>
</record>
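
The record above uses the `wicri:` and `hal:` prefixes without declaring them, so a namespace-aware XML parser will typically reject it with an "unbound prefix" error. The Python sketch below shows one possible workaround: wrap the record in an element that binds both prefixes before parsing. The namespace URIs, the `wrapper` element, and the `list_authors` helper are placeholders invented for this illustration; they are not part of Dilib or of the record's actual schema.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record declares none.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:hal="urn:example:hal">'
)
WRAPPER_CLOSE = "</wrapper>"


def list_authors(record_xml):
    """Return author names found under titleStmt in a <record> string."""
    root = ET.fromstring(WRAPPER_OPEN + record_xml + WRAPPER_CLOSE)
    return [name.text for name in root.findall(".//titleStmt/author/name")]


# Abridged copy of the record, kept short for the example.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<author>
<name sortKey="Bredin, Herve" first="Hervé" last="Bredin">Hervé Bredin</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-202" status="OLD"/>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Poignant, Johann" first="Johann" last="Poignant">Johann Poignant</name>
</author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""

print(list_authors(SAMPLE))
```

The same wrapper approach should apply to the full record text, since the only undeclared prefixes it contains are `wicri:` and `hal:`.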

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000063 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000063 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     Hal:hal-00953095
   |texte=   Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024